Analysis of optimal differential gene expression

Authors

  • Wolfram Liebermeister
  • Jürgen Mlynek
Abstract

This thesis is concerned with the observation that coregulation patterns in gene expression data often reflect functional structures of the cell. First, simulated gene expression data and expression data from yeast experiments are studied with independent component analysis (ICA) and with related factor models. Then, in a more theoretical approach, relations between gene expression patterns and the biological function of the genes are derived from an optimality principle. Linear factor models such as ICA decompose gene expression matrices into statistical components. The coefficients with respect to the components can be interpreted as profiles of hidden variables (called “expression modes”) that assume different values in the different samples. In contrast to clusterings, such factor models account for a superposition of effects and for individual responses of the different genes: each gene profile consists of a superposition of the expression modes, which thereby account for the common variation of many genes. The components are estimated blindly from the data, that is, without further biological knowledge, and most of the methods considered here can reconstruct almost sparse components. Thresholding a component reveals genes that respond strongly to the corresponding mode, in comparison to genes showing differential expression among individual samples. In this work, different factor models are applied to simulated and experimental expression data. To simulate expression data, it is assumed that gene expression depends on several unobserved variables (“biological expression modes”) which characterise the cell state and that the genes respond to them according to nonlinear functions called “gene programs”. Is there a chance to reconstruct such expression modes with a blind data analysis? The tests in this work show that the modes can be found with ICA even if the data are noisy or weakly nonlinear, or if the numbers of true and estimated components do not match. 
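As a notational sketch of the linear factor model described above: the symbols below are assumptions introduced for illustration and are not taken from the thesis. The expression matrix is written as a superposition of a few components, each weighted by the profile of one expression mode across the samples. ICA additionally seeks components that are as statistically independent as possible.

```latex
% Assumed notation: G genes, S samples, K expression modes.
X = A\,M + E,
\qquad
x_{gs} = \sum_{k=1}^{K} a_{gk}\, m_{ks} + \varepsilon_{gs}
% Column k of A: the k-th component (one loading per gene).
% Row k of M: the profile of the k-th expression mode across the samples.
% E: residual noise not explained by the K modes.

% Thresholding component k with an assumed cutoff \theta > 0
% selects the genes that respond strongly to mode k:
\mathcal{G}_k = \{\, g : |a_{gk}| > \theta \,\}
```

In this picture a clustering would correspond to each gene having a single nonzero loading, whereas the factor model lets every gene respond to several modes at once.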
Regression models are fitted to the profiles of single genes to explain their expression by expression modes from factor models or by the expression of single transcription factors. Nonlinear gene programs are estimated by nonlinear ICA: such effective gene programs may be used for describing gene expression in large cell models. ICA and similar methods are applied to expression data from cell-cycle experiments: besides biologically interpretable modes, experimental artefacts, probably caused by hybridisation effects and contamination of the samples, are identified. It is shown for single components that the coregulated genes share biological functions and the corresponding enzymes are concentrated in particular regions of the metabolism. Thus, as an outcome of evolution, the expression machinery seems to portray functional relationships between the genes: regarding the economy of resources, it would probably be inefficient if cooperating genes were not coregulated. To formalise this teleological view on gene expression, a mathematical model for the analysis of optimal differential expression (ANODE) is proposed in this work: the model describes regulators, such as genes or enzymes, and output variables, such as metabolic fluxes. The system's behaviour is evaluated by a fitness function, which, for instance, rates some of the metabolic fluxes in the cell and which is supposed to be optimised. This optimality principle defines an optimal response of regulators to small external perturbations. For calculating the optimal regulation patterns, the system to be controlled needs to be known only partially: it suffices to predefine its possible behaviour around the optimal state and the local shape of the fitness function. The method is extended to time-dependent perturbations: to describe the response of metabolic systems to small oscillatory perturbations, frequency-dependent control coefficients are defined and characterised by summation and connectivity theorems.
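The idea that only the local shape of the fitness function is needed can be illustrated with a generic second-order argument. The following is a minimal sketch under stated assumptions, not the ANODE model itself. Near the optimum, the fitness is approximated by a quadratic form with a Hessian `H` with respect to the regulators and a matrix `B` of mixed second derivatives between regulators and perturbation parameters. Requiring the gradient to vanish again after a small perturbation `dx` gives `H du + B dx = 0`. All names and numerical values below are hypothetical.

```python
# Hypothetical local model: near the optimum the fitness is approximated by
#   F(du, dx) ~ 0.5 * du^T H du + du^T B dx
# (terms that depend on dx alone do not affect the optimal du).
# H: Hessian w.r.t. two regulators, negative definite at a maximum.
# B: mixed second derivatives between regulators and perturbation parameters.


def solve_2x2(m, rhs):
    """Solve m @ x = rhs for a 2x2 matrix m using Cramer's rule."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("singular matrix")
    return [(rhs[0] * d - b * rhs[1]) / det, (a * rhs[1] - c * rhs[0]) / det]


def optimal_response(H, B, dx):
    """Return du satisfying H du + B dx = 0, i.e. du = -H^{-1} B dx."""
    b_dx = [sum(B[i][j] * dx[j] for j in range(len(dx))) for i in range(2)]
    return solve_2x2(H, [-v for v in b_dx])


def gradient(H, B, du, dx):
    """Gradient of the local quadratic fitness with respect to the regulators."""
    return [
        sum(H[i][j] * du[j] for j in range(2))
        + sum(B[i][j] * dx[j] for j in range(len(dx)))
        for i in range(2)
    ]


# Illustrative numbers only: two regulators, one perturbation parameter.
H = [[-2.0, 0.5], [0.5, -1.0]]
B = [[1.0], [0.5]]
dx = [0.1]

du = optimal_response(H, B, dx)
print("optimal regulator response:", du)
print("gradient after adjustment:", gradient(H, B, du, dx))
```

The response is linear in the perturbation, so in this sketch the regulation pattern is fixed entirely by `H` and `B`, without a full model of the underlying system.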
For testing the predicted relation between expression and function, control coefficients are simulated for a large-scale metabolic network and their statistical properties are studied: the structure of the control coefficients matrix portrays the network topology, that is, chemical reactions tend to have little control on distant parts of the network. Furthermore, control coefficients within subnetworks depend only weakly on the modelling of the surrounding network. Several plausible assumptions about appropriate expression patterns can be formally derived from the optimality principle: the main result is a general relation between the behaviour of regulators and their biological functions, which implies, for example, the coregulation of enzymes acting in complexes or functional modules. In this context, the functions of genes are quantified by their linear influences (called “response coefficients”) on fitness-relevant cell variables. For enzymes controlling metabolism, the theorems of metabolic control theory lead to sum rules that relate the expression patterns to the structure of the metabolic network. Further predictions concern a symmetric compensation for gene deletions and a relation between gene expression and the fitness loss caused by gene deletions. If optimal regulation is realised by feedback signals between the cell variables and the regulators, then functional relations are also portrayed in the linear feedback coefficients, so genes of similar function may be expected to share inputs from the same signalling cascades. According to the model of optimal regulation, expression profiles are linear combinations of response coefficient profiles: tests with experimental expression profiles and simulated control coefficients support this hypothesis, and the common components which are estimated from both kinds of data provide a vivid picture of the metabolic adaptations that are required in different environments. 
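For orientation, the classical steady-state summation and connectivity theorems of metabolic control analysis can be written as follows. These are the standard textbook forms for scaled coefficients in a network without conserved moieties, not the frequency-dependent generalisations defined in the thesis, and the notation is an assumption.

```latex
% Assumed notation: C^{J}_{v_i} is the scaled control coefficient of
% reaction i on a steady-state flux J, C^{S_j}_{v_i} its control on the
% concentration of metabolite S_j, and \varepsilon^{v_i}_{S_k} the
% elasticity of reaction i with respect to metabolite S_k.

% Summation theorems
\sum_{i} C^{J}_{v_i} = 1,
\qquad
\sum_{i} C^{S_j}_{v_i} = 0

% Connectivity theorems
\sum_{i} C^{J}_{v_i}\, \varepsilon^{v_i}_{S_k} = 0,
\qquad
\sum_{i} C^{S_j}_{v_i}\, \varepsilon^{v_i}_{S_k} = -\delta_{jk}
```

Relations of this kind constrain the control coefficients through the network structure, which is how sum rules for expression patterns can inherit information about the metabolic network.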
To summarise, empirical relations between gene expression and function have been confirmed in this work. Furthermore, such relations have been predicted on theoretical grounds. A main aim is to clarify teleological assertions about gene expression by deriving them from explicit assumptions, and thus to provide a theoretical framework for the integration of expression data and functional annotations. While other authors have compared expression to functional gene categories or topologically defined metabolic pathways, I propose to relate it to the response coefficients. A main result of this work is that general relations are predicted between a gene’s function, its optimal expression behaviour, and its regulatory program. Where the assumption of optimality is valid, the model justifies the use of expression data for functional annotation and pathway reconstruction, and it provides a function-related interpretation for the linear components behind expression data. The methods from this work are not limited to gene expression data: the factor models are applicable to protein and metabolite data as well, and the optimality principle may also apply to other regulatory mechanisms, such as the allosteric control of enzymes.

Similar resources

Global gene expression analysis using microarray to study differential vulnerability to neurodegeneration

Neurodegenerative disorders such as Parkinson’s disease, motor neuron disease and Alzheimer’s disease are characterized by loss of specific cells within certain regions of the brain. One of the most compelling questions is to determine why specific cell populations are vulnerable to neurodegeneration. We addressed this question by studying global gene expression changes using an animal model of ...

P-70: Evidence for Differential Gene Expression of A Major Epigenetic Modifier Enzyme, de novo DNA Methyltransferase 3b, through Vitrification of Mouse Ovary Tissue

Background: Ovarian tissue cryopreservation is a feasible method to preserve female reproductive potential, especially in young patients with cancer or in women at risk of premature ovarian failure. Vitrification has recently emerged as a new trend for biological specimen preservation. On the other hand, gene expression that changes during vitrification can influence oocyte maturation and need ...

Differential genes expression analysis of invasive aspergillosis: a bioinformatics study based on mRNA/microRNA

Invasive aspergillosis is a severe opportunistic infection with high mortality in immunocompromised patients. Recently, the roles of microRNAs have been taken into consideration in the immune system and inflammatory responses. Using bioinformatics approaches, we aimed to study the microRNAs related to invasive aspergillosis to understand the molecular pathways involved in the disease pathogenes...

Gene Expression Profile Analysis during Mouse Tooth Development

Introduction: Complex molecular pathways are involved in the development of different tissues such as teeth. Differential gene expression patterns during tooth development generate different tooth types. Tooth development results from interactions between the oral epithelium and underlying ectomesenchyme cells of neural crest origin. Tooth development is regulated by different signaling networks. In thi...

Regulatory effects of cis- and trans-LncRNAs on differential expression of genes following infection with viral hemorrhagic septicemia virus in rainbow trout (Oncorhynchus mykiss)

In this study, the cis and trans regulatory effects of long non-coding genes (lncRNA) on the expression of genes in fish infected by viral hemorrhagic septicemia virus (VHS) were investigated using RNA-seq technology. At the end of the experimental period (the thirty-fifth day), total RNA was extracted from spleen tissue (group treated with virus) and physiological serum (control group) was used to pr...

Publication date: 2004